-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a HNSW collector that exits early when nearest neighbor queue saturates #14094
base: main
Are you sure you want to change the base?
Conversation
lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java
Outdated
Show resolved
Hide resolved
lucene/core/src/test/org/apache/lucene/search/HnswQueueSaturationCollectorTest.java
Outdated
Show resolved
Hide resolved
public interface HnswKnnCollector extends KnnCollector { | ||
|
||
/** Indicates exploration of the next HNSW candidate graph node. */ | ||
void nextCandidate(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this kind of collector is OK. But it makes most sense to me to be a delegate collector. An abstract collector to KnnCollector.Delegate
.
Then, I also think that the OrdinalTranslatingKnnCollector
should inherit directly from HnswKnnCollector
always assuming that the passed in collector is a HnswKnnCollector
.
Note, the default behavior for HnswKnnCollector#nextCandidate
can simply be nothing, allowing for overriding.
This might require a new HnswGraphSearcher#search
interface to keep the old collector actions, but it can be simple to add a new one that accepts a HnswKnnCollector
and delegate to it with new HnswKnnCollector(KnnCollector delegate)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adjusted my refactoring for the seeded queries similarly. It seems nicer IMO: #14170
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Ben. I'll incorporate your suggestions once #14170 is in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made HnswKnnCollector
a KnnCollector.Decorator
in c6dbf7e
updated results (Cohere-768, 200k docs, merge disabled) baseline
candidate
|
reference paper |
I've updated this and moved the early termination logic not to kick in by default but to be based on a (wrapping) |
updated maxconn=32baseline
candidate@default
candidate@sat=0.995,patience=maxconn(32)
candidate@sat=0.95,patience=fanout(50)
candidate@sat=0.995,patience=fanout(50)
maxconn=64baseline
candidate@default
candidate@sat=0.995,patience=maxconn(64)
candidate@sat=0.95,patience=fanout(50)
candidate@sat=0.995,patience=fanout(50)
|
super(collector); | ||
} | ||
|
||
/** Indicates exploration of the next HNSW candidate graph node. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: suggest "triggers" instead of "indicates"
|
||
@Override | ||
public String toString(String field) { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add some name here, maybe in the future it will be easier to debug?
This introduces a
HnswKnnCollector
interface, extendingKnnCollector
for HNSW, to make it possible to hook into HNSW execution for optimizations.It then adds a new collector which uses a saturation-based threshold to dynamically halt HNSW graph exploration, in order to early exit when the exploration of new candidates is unlikely to lead to addition of new neighbors.
The new collector records the number of added neighbors upon exploration of a new candidate (a HNSW node) and it compares it with the number of neighbors added while exploring the previous candidate, when the rate of added neighbors plateaus for a number of consecutive iterations, it stops graph exploration (
earlyTerminate
returnstrue
).